Search CORE

37 research outputs found

Testing word embeddings for Polish

Author: Agnieszka Mykowiecka
Małgorzata Marciniak
Piotr Rychlik
Publication venue: 'Institute of Slavic Studies Polish Academy of Sciences'
Publication date: 01/01/2017
Field of study

Testing word embeddings for Polish Distributional Semantics postulates the representation of word meaning in the form of numeric vectors which represent words which occur in context in large text data. This paper addresses the problem of constructing such models for the Polish language. The paper compares the effectiveness of models based on lemmas and forms created with Continuous Bag of Words (CBOW) and skip-gram approaches based on different Polish corpora. For the purposes of this comparison, the results of two typical tasks solved with the help of distributional semantics, i.e. synonymy and analogy recognition, are compared. The results show that it is not possible to identify one universal approach to vector creation applicable to various tasks. The most important feature is the quality and size of the data, but different strategy choices can also lead to significantly different results. Testowanie wektorowych reprezentacji dystrybucyjnych słów języka polskiego Semantyka dystrybucyjna opiera się na założeniu, że znaczenie słów wyrażone jest za pomocą wektorów reprezentujących, w sposób bezpośredni bądź pośredni, konteksty, w jakich słowo to jest używane w dużym zbiorze tekstów. Niniejszy artykuł dotyczy ewaluacji wielu takich modeli skonstruowanych dla języka polskiego. W pracy porównano skuteczność modeli opartych na lematach i formach słów, utworzonych przy wykorzystaniu sieci neuronowych na danych z dwóch różnych korpusów języka polskiego. Ewaluacji dokonano na podstawie wyników dwóch typowych zadań rozwiązywanych za pomocą metod semantyki dystrybucyjnej, tzn. rozpoznania występowania synonimii i analogii między konkretnymi parami słów. Uzyskane wyniki dowodzą, że nie można wskazać jednego uniwersalnego podejścia do tworzenia modeli dystrybucyjnych, gdyż ich skuteczność jest różna w zależności od zastosowania. Najważniejszą cechą wpływającą na jakość modelu jest jakość oraz rozmiar danych, ale wybory różnych strategii uczenia sieci mogą również prowadzić do istotnie odmiennych wyników

Crossref

Biblioteka Nauki - repozytorium artykuÅÃ³w

Directory of Open Access Journals

Wspomnienie o Doktor Czesławie Leszczyk (1932–2016)

Author: Gałecki Jacek
Mykowiecka Agnieszka
Publication venue: 'Salvia Medical Sciences Ltd'
Publication date: 25/08/2017
Field of study

Via Medica Journals

Simultaneous Reconstruction of Duplication Episodes and Gene-Species Mappings

Author: Mykowiecka Agnieszka
Paszek Jaros?aw
Rutecka Natalia
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)
Publication date: 01/01/2023
Field of study

We present a novel problem, called MetaEC, which aims to infer gene-species assignments in a collection of gene trees with missing labels by minimizing the size of duplication episode clustering (EC). This problem is particularly relevant in metagenomics, where incomplete data often poses a challenge in the accurate reconstruction of gene histories. To solve MetaEC, we propose a polynomial time dynamic programming (DP) formulation that verifies the existence of a set of duplication episodes from a predefined set of episode candidates. We then demonstrate how to use DP to design an algorithm that solves MetaEC. Although the algorithm is exponential in the worst case, we introduce a heuristic modification of the algorithm that provides a solution with the knowledge that it is exact. To evaluate our method, we perform two computational experiments on simulated and empirical data containing whole genome duplication events, showing that our algorithm is able to accurately infer the corresponding events

Dagstuhl Research Online Publication Server

Conflict Resolution Algorithms for Deep Coalescence Phylogenetic Networks

Author: D?bkowski Dawid
Mykowiecka Agnieszka
Rutecka Natalia
Wawerka Marcin
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)
Publication date: 01/01/2021
Field of study

We address the problem of inferring an optimal tree displayed by a network, given a gene tree G and a tree-child network N, under the deep coalescence cost. We propose an O(|G||N|)-time dynamic programming algorithm (DP) to compute a lower bound of the optimal displayed tree cost, where |G| and |N| are the sizes of G and N, respectively. This algorithm has the ability to state whether the cost is exact or is a lower bound. In addition, our algorithm provides a set of reticulation edges that correspond to the obtained cost. If the cost is exact, the set induces an optimal displayed tree that yields the cost. If the cost is a lower bound, the set contains pairs of conflicting edges, that is, edges sharing a reticulation node. Next, we show a conflict resolution algorithm that requires 2^{r+1}-1 invocations of DP in the worst case, where r is a number of reticulations. We propose a similar O(2^k|G||N|)-time algorithm for level-k networks and a branch and bound solution to compute lower and upper bounds of optimal costs. We also show how our algorithms can be extended to a broader class of phylogenetic networks. Despite their exponential complexity in the worst case, our solutions perform significantly well on empirical and simulated datasets, thanks to the strategy of resolving internal dissimilarities between gene trees and networks. In particular, experiments on simulated data indicate that the runtime of our solution is ?(2^{0.543 k}|G||N|) on average. Therefore, our solution is an efficient alternative to enumeration strategies commonly proposed in the literature and enables analyses of complex networks with dozens of reticulations

Dagstuhl Research Online Publication Server

A word from the editors

Author: Adam Przepiórkowski
Agnieszka Mykowiecka
Elżbieta Hajnicz
Marcin Woliński
Publication venue: 'Institute of Computer Science, Polish Academy of Sciences'
Publication date: 01/06/2015
Field of study

(no abstract

Directory of Open Access Journals

Clustering of human liver proteins according to their taxonomic profiles

Author: Agnieszka Mykowiecka (98993)
Publication venue
Publication date
Field of study

<p>liver protein clusters</p

FigShare